Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression

Authors

Diandra Fabre

Thomas Hueber

Pierre Badin

Abstract

This paper presents a method for automatically animating the articulatory tongue model of a reference speaker from ultrasound images of the tongue of another speaker. This work is developed in the context of speech therapy based on visual biofeedback, where a speaker is provided with visual information about their own articulation. In our approach, the feedback is delivered via an articulatory talking head, which displays the tongue during speech production using augmented reality (e.g., transparent skin). The user’s tongue movements are captured using ultrasound imaging and parameterized using the PCA-based EigenTongue technique. Extracted features are then converted into control parameters of the articulatory tongue model using Gaussian Mixture Regression. This procedure was evaluated by decoding the converted tongue movements at the phonetic level using an HMM-based decoder trained on the reference speaker's articulatory data. Decoding errors were then manually reassessed in order to take into account possible phonetic idiosyncrasies (i.e., speaker- or phoneme-specific articulatory strategies). With a system trained on a limited set of 88 VCV sequences, the recognition accuracy at the phonetic level was found to be approximately 70%.
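The conversion step named in the abstract, Gaussian Mixture Regression, can be illustrated with a minimal sketch. It assumes a joint Gaussian mixture has already been fitted over stacked vectors [x; y], where x stands for the ultrasound-derived features and y for the tongue-model control parameters. The function names, dimensions, and toy parameter values below are hypothetical and are not taken from the paper; the sketch only shows the standard conditional-expectation form of the regression, not the authors' implementation.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal evaluated at a single point x."""
    d = x.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ solved) / norm)


def gmr_predict(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y].

    weights: mixture weights, one per component
    means:   joint mean vectors, each of length dx + dy
    covs:    joint covariance matrices, each (dx + dy) x (dx + dy)
    dx:      number of input (x) dimensions
    """
    resp = []
    cond_means = []
    for w, mu, sigma in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        s_xx = sigma[:dx, :dx]
        s_yx = sigma[dx:, :dx]
        # Unnormalized posterior probability of this component given x.
        resp.append(w * gaussian_pdf(x, mu_x, s_xx))
        # Conditional mean of y given x for this component.
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    resp = np.array(resp)
    resp = resp / resp.sum()
    # Blend the per-component linear predictions by their posteriors.
    return sum(r * m for r, m in zip(resp, cond_means))


# Toy model (hypothetical values): 2-D input, 1-D output, two components.
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0, 1.0]), np.array([10.0, 10.0, -1.0])]
covs = [
    np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]),
    np.eye(3),
]
y_hat = gmr_predict(np.array([2.0, 0.0]), weights, means, covs, dx=2)
```

With a single component, this reduces to ordinary linear regression on the joint Gaussian; with several components, each contributes a local linear map weighted by how well it explains the input.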

Related resources

Automatic classification of Non-alcoholic fatty liver using texture features from ultrasound images

Background: Accurate and early detection of non-alcoholic fatty liver, which is a major cause of chronic diseases is very important and is vital to prevent the complications associated with this disease. Ultrasound of the liver is the most common and widely performed method of diagnosing fatty liver. However, due to the low quality of ultrasound images, the need for an automatic and intelligent...

Can tongue be recovered from face? the answer of data-driven statistical models

This study revisits the face-to-tongue articulatory inversion problem in speech. We compare the Multi Linear Regression method (MLR) with two more sophisticated methods based on Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs), using the same French corpus of articulatory data acquired by ElectroMagnetoGraphy. GMMs give overall results better than HMMs, but MLR does poorly. GMMs a...

Statistical Mapping Between Articulatory and Acoustic Data for an Ultrasound-Based Silent Speech Interface

This paper presents recent developments on our “silent speech interface” that converts tongue and lip motions, captured by ultrasound and video imaging, into audible speech. In our previous studies, the mapping between the observed articulatory movements and the resulting speech sound was achieved using a unit selection approach. We investigate here the use of statistical mapping techniques, ba...

Recognizing Whispered Speech Produced by an Individual with Surgically Reconstructed Larynx Using Articulatory Movement Data.

Individuals with larynx (vocal folds) impaired have problems in controlling their glottal vibration, producing whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as the tongue and lip motion may help in recog...

Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training

Silent speech recognition (SSR) converts non-audio information (e.g., articulatory information) to speech. SSR has potential to enable laryngectomees to produce synthesized speech with a natural sounding voice. Despite its recent advances, current SSR research has largely relied on speaker-dependent recognition. High degree of variation in articulatory patterns across different talkers has been...

Publication date: 2014
